Multi-defect risk assessment in high-speed rail subgrade infrastructure in China

This study addresses the escalating risk to high-speed railway (HSR) infrastructure in China, amplified by climate warming, increased rainfall, frequent extreme weather, and geohazard events. Leveraging a georeferenced dataset of recent HSR defects obtained through an extensive literature review, we employ machine learning techniques for a quantitative multi-defect risk assessment. Climatic, geomorphological, geohydrological, and anthropogenic variables influencing HSR subgrade safety are identified and ranked. Climatic factors significantly impact frost damage and mud pumping, while geomorphological variables exhibit greater influence on settlement and uplift deformation defects. Notably, frost damage is prevalent in the northeast and northwest, mud pumping along the southeast coast, and settlement and uplift deformation in the northwest and central areas. The generated comprehensive risk map highlights high-risk zones, particularly the Menyuan Hui Autonomous County and Minle County sections of the Lanzhou-Urumqi HSR, emphasizing the need for focused attention and preventive actions to mitigate potential losses and ensure operational continuity.


Historical HSR subgrade defect occurrences in China
We recently compiled an extensive georeferenced dataset of historical HSR subgrade defects 17. The dataset was sourced from 24,735 peer-reviewed publications published from 1999 to 2022 in both Chinese and English, and a quality control procedure was applied to remove duplicates and ensure accuracy 18,19. Subsequently, a total of 661 georeferenced event records of eight defect types were selected, spanning provincial, municipal, county, township, and smaller scales. Notably, subgrade settlement (settlement values ranging from 5 to 2300 mm), frost damage (frost heave values ranging from 4 to 50 mm), uplift deformation (ranging from 5 to 122 mm), and mud pumping exhibit the longest reporting history among the identified defect types. Their definitions are detailed in Table 1. The distribution of HSR subgrade defect records across Chinese prefectural-level administrative regions is illustrated in Fig. 1.
The results indicate that the occurrence of these defects can be closely related to the local climate and geological environment. For example, frost damage events are concentrated in the temperate zone of China, which is characterized by long, cold winters and high humidity throughout the year. Pore water in the subgrade soil freezes and forms ice layers, resulting in soil displacement and subgrade frost heave. Mud pumping events are concentrated in the southeastern part of China, where frequent heavy rainfall causes a large amount of rainwater to infiltrate the subbase and reduce its bearing stiffness. Under the high-frequency dynamic loads of trains, mud pumping and, in severe cases, subgrade settlement can occur. Subgrade swelling and upheaval are closely related to the slight expansion of the fill material used. Within the same climatic zone, multiple defects often coexist, making the subgrade condition more complex.
Table 1. Definition of major types of HSR subgrade defect.

Subgrade settlement
Settlement refers to the vertical deformation that occurs over a small or extensive area due to inadequate compaction of subgrade soil, insufficient depth of foundation treatment, damage between piles, creep of underlying soil layers, or regional settling

Frost damage
In cold regions, subgrade and its protective structures experience uneven frost heave under low temperature conditions, leading to issues like tilting and cracking of protective structures

Mud pumping
This defect occurs in areas with poor drainage. Repeated vibrations from train traffic cause softening or thixotropic liquefaction of the sub-ballast, leading to the formation of mud slurry

Uplift deformation
This happens when expansive soils or rocks within the subgrade or its base react with external moisture, causing the subgrade to arch upwards www.nature.com/scientificreports/

Climate variables
Average annual rainfall: Rainfall may alter the engineering properties of subgrade materials, thereby influencing the stability of the subgrade 20 .
Consecutive 5-day rainfall: This data serves as an index reflecting extreme rainfall 21 .

Number of days with maximum temperature exceeding 35 degrees Celsius: This indicator reflects extreme high temperatures 16.
Annual freezing days: Annual freezing days quantify the number of days in a region on which water freezes, and are a key factor influencing the occurrence of frost damage on roadbeds 15.
Wind speed: Strong winds may erode road shoulders, reducing the subgrade width and exposing sleepers or track panels, thereby affecting the stability of the railway track 22.
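
The climate indicators listed above can be derived from daily weather records. The following is a minimal sketch, not the authors' processing chain: it assumes that "consecutive 5-day rainfall" means the largest 5-day cumulative total, and that a freezing day is one whose daily minimum temperature is below 0 degrees Celsius. The function names are hypothetical.

```python
from typing import Sequence


def max_consecutive_rainfall(daily_rain_mm: Sequence[float], window: int = 5) -> float:
    """Largest rainfall total over any `window` consecutive days (assumed definition)."""
    if len(daily_rain_mm) < window:
        return float(sum(daily_rain_mm))
    running = sum(daily_rain_mm[:window])
    best = running
    # Slide the window one day at a time: add the new day, drop the oldest day.
    for i in range(window, len(daily_rain_mm)):
        running += daily_rain_mm[i] - daily_rain_mm[i - window]
        best = max(best, running)
    return float(best)


def count_hot_days(daily_tmax_c: Sequence[float], threshold_c: float = 35.0) -> int:
    """Number of days whose maximum temperature exceeds the threshold."""
    return sum(1 for t in daily_tmax_c if t > threshold_c)


def count_freezing_days(daily_tmin_c: Sequence[float], threshold_c: float = 0.0) -> int:
    """Number of days whose minimum temperature is below freezing (assumed definition)."""
    return sum(1 for t in daily_tmin_c if t < threshold_c)
```

Applied to one year of daily records per grid cell, these three functions would yield one value per cell for each indicator.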

Geomorphological variables
Elevation: Elevation defines the highest and lowest points within a region and is reported to relate to the occurrence of various defects; for example, a number of defects have been reported on the high-altitude Menyuan-Minle section of the Lanzhou-Urumqi HSR 15.
Slope and aspect: HSR subgrades may have varying slopes, resulting in different temperatures inside and outside the subgrade, potentially leading to uneven settlement 23,24. The slope gradient may also affect the flow of moisture, thereby disrupting the drainage of the subgrade 25.

Geohydrological variables
Rock hardness: Harder rocks can provide better support for HSR subgrade 3 .
Distance to fault: Geological faults provide pathways for groundwater and surface precipitation, which can affect subgrade 26 .
Soil texture: Subgrade defects can be associated with the types and properties of surrounding soil 27 .
Average distance to river: The presence of rivers increases the amount of groundwater in the surrounding geological environment, thus affecting the performance of subgrade 28 .
Average distance to road: Road construction, as a human activity, can have an impact on railway lines 30,31.

Variable sources and preparation
The average annual rainfall, consecutive

Data processing
The variables were standardized using the StandardScaler module, and the hyperparameters of the RF were optimized using grid search to build a screening model 32. To streamline the model and enhance its performance, recursive feature elimination was used to remove the environmental variables with minimal contribution 33,34. Specifically, an RF model was iteratively established 18 times, eliminating the least important environmental factors in each screening round based on their contribution. To prevent important factors from being eliminated incorrectly, a criterion was set so that the contribution of the eliminated factors did not exceed 0.005. The remaining predictor factors were then reintroduced into the model. Finally, 55 of the initial 73 factors were retained for each type of defect to construct the risk prediction model. All the factors are shown in Table 2.
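
The screening procedure described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' code: `standardize` mirrors the zero-mean, unit-variance scaling that StandardScaler performs, `importance_fn` is a hypothetical callback standing in for refitting the RF on the remaining factors, and the sketch assumes one factor is dropped per round and that elimination stops as soon as the weakest factor exceeds the 0.005 threshold.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def standardize(column: Sequence[float]) -> List[float]:
    """Scale a variable to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - mean) / std for x in column]


def recursive_elimination(
    features: List[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    n_rounds: int = 18,
    max_drop_importance: float = 0.005,
) -> List[str]:
    """Drop the least important factor each round, guarded by a threshold.

    `importance_fn` (hypothetical) refits a model on the given factors and
    returns their importance scores.
    """
    remaining = list(features)
    for _ in range(n_rounds):
        if len(remaining) <= 1:
            break
        scores = importance_fn(remaining)
        weakest = min(remaining, key=lambda name: scores[name])
        if scores[weakest] > max_drop_importance:
            break  # safeguard: never discard a factor above the threshold
        remaining.remove(weakest)
    return remaining
```

With 73 starting factors and 18 rounds in which the threshold is never exceeded, this loop leaves 55 factors, which matches the counts reported in the text.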

Random forest modelling
The RF model is one of the most commonly used ensemble algorithms in applied machine learning studies 35,36. It uses repeated independent sampling to draw multiple samples from the original dataset and constructs a decision tree for each sample. These decision trees are then aggregated by voting, with each tree acting as one member, to achieve classification and prediction. In this study, the RF algorithm is a crucial tool for predicting the risk of subgrade defects in HSR infrastructure. Its capacity to process extensive datasets with various input variables and its robustness against overfitting make it exceptionally suited for this task. Furthermore, as a non-parametric model, RF does not require assumptions about any specific form of relationship between variables, which is a significant advantage when examining the complex and not yet fully understood interplay between environmental factors and subgrade defects. Applying RF allows us to capture nonlinear relationships and variable interactions that traditional statistical methods might overlook. Finally, RF is widely recognized as effective in identifying and determining variable importance. As a result, this approach has been successfully applied in the past for mapping landslides, debris flows, and many other types of disasters 28,29,37.

RF calculates the decrease in the Gini index, $D_{Gk}$, attributable to evaluation factor $k$ during node splitting. The importance of evaluation factor $k$ is determined by summing $D_{Gk}$ over all nodes in the forest and averaging over all trees. This measure represents the average decrease in the Gini index for the evaluation factor as a share of the total average decrease in the Gini index for all factors. It is calculated according to Eq. (1):

$$P_k = \frac{\sum_{h=1}^{n}\sum_{j=1}^{l} D_{Gkhj}}{\sum_{k=1}^{m}\sum_{h=1}^{n}\sum_{j=1}^{l} D_{Gkhj}} \tag{1}$$

where $m$, $n$, and $l$ represent the total number of evaluation factors, the number of classification trees, and the number of nodes in a single tree, respectively. $D_{Gkhj}$ refers to the decrease in the Gini index of the $j$th node in the $h$th tree for the $k$th evaluation factor. $P_k$ denotes the importance level of the $k$th evaluation factor among all evaluation factors.
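
The normalisation described for Eq. (1) amounts to dividing each factor's summed Gini decrease by the total over all factors. The sketch below illustrates that arithmetic only; the nested-list layout of `decreases` and the function name are assumptions made for the example, not part of the study.

```python
from typing import Dict, List


def gini_importance(decreases: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Relative importance P_k of each factor.

    decreases[k][h] lists the Gini decreases D_Gkhj at the nodes j of tree h
    where factor k was used for the split (hypothetical data layout).
    """
    # Numerator of Eq. (1): sum over trees h and nodes j for each factor k.
    per_factor = {
        k: sum(sum(tree_nodes) for tree_nodes in trees)
        for k, trees in decreases.items()
    }
    # Denominator of Eq. (1): the same sum taken over all factors.
    total = sum(per_factor.values())
    if total == 0:
        raise ValueError("no Gini decrease recorded for any factor")
    return {k: value / total for k, value in per_factor.items()}
```

Because every value is divided by the same total, the resulting importances sum to 1, so a value such as 0.063 can be read as a 6.3% share of the total Gini decrease.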
When constructing the RF models, the dataset was divided in a 7:3 ratio for training and validation. To enhance the robustness of model predictions and quantify the uncertainty, we employed an ensemble of 50 models trained on separate bootstraps of the dataset. The hyperparameters of each of the 50 individual models were determined using grid search, with random combinations of parameters, while all other tuning parameters were set to their default values. The combination with the highest average accuracy across the models was selected as the optimal parameter choice for the model. Furthermore, a five-fold cross-validation strategy was employed, whereby the training dataset was divided into 5 equal subsets, with 4 subsets used for model training and the remaining subset used for testing. This five-fold process was repeated iteratively, rotating the testing subset, in order to fully leverage all the training data for model training and testing while mitigating the impact of overfitting. To minimize the influence of randomness, 50 models were fitted for each defect type. Each of these 50 models predicted the environmental risk on a continuous scale ranging from 0 to 1, and the final prediction map was generated by calculating the average prediction across all models. The model's classification accuracy was analyzed using the Receiver Operating Characteristic (ROC) curve 38-40, which plots the true positive rate (the proportion of correctly identified defect samples) on the vertical axis against the false positive rate (the proportion of falsely identified non-defect samples) on the horizontal axis. A higher true positive rate and a lower false positive rate indicate greater classification accuracy.
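
The data-handling steps in this paragraph (7:3 split, bootstrap resampling, five-fold rotation, and ensemble averaging) can be sketched with the standard library alone. The function names are hypothetical and the model fitting itself is omitted; this is a sketch of the index bookkeeping under those assumptions, not the authors' pipeline.

```python
import random
from typing import List, Sequence, Tuple


def train_validation_split(n_samples: int, train_fraction: float = 0.7,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and split them (7:3 by default)."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(indices: Sequence[int],
                   k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index pairs, rotating the held-out fold."""
    folds = [list(indices[i::k]) for i in range(k)]
    pairs = []
    for held_out in range(k):
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        pairs.append((train, folds[held_out]))
    return pairs


def bootstrap_sample(indices: Sequence[int], rng: random.Random) -> List[int]:
    """Draw len(indices) indices with replacement."""
    return [rng.choice(indices) for _ in indices]


def ensemble_mean(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-location predictions across ensemble members."""
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]
```

In use, each of the 50 ensemble members would be fitted on its own `bootstrap_sample` of the training indices, and `ensemble_mean` would combine their 0 to 1 risk predictions into the final map values.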

Integrated risk map generation
The integrated HSR infrastructure risk assessment involves a holistic analysis of the subgrade defects most commonly reported in China, namely settlement, frost damage, uplift deformation, and mud pumping. This approach takes into account the cumulative impact of various factors, including climatic conditions, geomorphological features, geohydrological characteristics, and human activities, on the subgrade's safety. Regions with relatively high integrated risk scores have a greater need for coordination and management to mitigate potential risk effectively.
To quantify this integrated risk, we used the Random Forest (RF) model to evaluate the probability of each defect type occurring, averaging the outcomes across 50 iterations. Natural breakpoints were then used to divide each defect into four risk levels: low, low-medium, medium-high, and high 28,41,42. Portions with average

Evaluation of model predictive power
This study employed RF to evaluate the susceptibility of HSR subgrade defects in China and verified the training accuracy (success rate) using the ROC curve. The average AUCs for subgrade settlement, frost damage, uplift deformation, and mud pumping, obtained through 50 rounds of sampling, were 0.76, 0.96, 0.80, and 0.81, respectively, as shown in Fig. 2. The green line represents the average ROC curve, while the black lines represent the 50 individual ROC curves. These results demonstrate that the RF model has good predictive capability for generating risk maps of subgrade defects.
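
For readers who want to reproduce the kind of AUC values reported above, the following minimal sketch builds an ROC curve from binary labels and predicted scores and integrates it with the trapezoidal rule. It is a generic illustration with hypothetical function names, not the evaluation code used in the study.

```python
from typing import List, Sequence, Tuple


def roc_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    labels: 1 for defect samples, 0 for non-defect samples.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples sharing a score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Averaging `auc` over the curves of the 50 ensemble members would give a mean AUC of the kind quoted for each defect type.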

Predicted high risk areas for different subgrade defect types
Predicted high-risk areas for settlement defects are mainly concentrated along the Lanzhou-Urumqi HSR and the Shanghai-Nanjing Intercity Railway, with higher susceptibility in northwest and central China (Fig. 3a). Frost damage risks (Fig. 3b) were predicted to concentrate primarily along the Harbin-Dalian HSR and the Lanzhou-Urumqi HSR, with higher susceptibility in northeastern and western China. Areas prone to uplift deformation (Fig. 3c) were predicted to concentrate mainly along the Lanzhou-Urumqi HSR, while mud pumping defects (Fig. 3d) were predicted to concentrate primarily along the Shanghai-Nanjing Intercity Railway and the Wuhan-Guangzhou HSR, with higher susceptibility in southeast China.
In the integrated risk map for subgrade defects in China's HSR (Fig. 4), the predominant occurrences of subgrade defects are concentrated in the northeast, northwest, and central regions. The

Key environmental drivers of subgrade defect risk
The occurrence of each defect is influenced by multiple factors, each with a varying degree of impact. Using the "Gini coefficient" based on the RF model 43, the average factor importance over the 50 sets was calculated to generate the final factor importance ranking for each defect, as shown in Fig. 5. We selected the top 10 most important factors for presentation. For settlement defects, the most important factors were elevation, slope, and land use-bare rock, with importance values of 0.063, 0.044, and 0.041, respectively. For frost damage, the most important factors were the number of freezing days per year, annual average rainfall, and continuous 5-day cumulative rainfall, with importance values of 0.20, 0.082, and 0.076, respectively. For uplift deformation, elevation, continuous 5-day cumulative rainfall, and land use-bare rock had importance values of 0.081, 0.048, and 0.047, respectively. For mud pumping, the driving factors were the number of days with maximum temperature exceeding 35 degrees Celsius, annual average rainfall, and the number of freezing days per year, with importance values of 0.077, 0.070, and 0.067, respectively.
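
The averaging and ranking step described above can be illustrated as follows. This is a sketch with hypothetical function names; it assumes every model run reports an importance value for the same set of factors.

```python
from typing import Dict, List, Sequence, Tuple


def mean_importance(runs: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each factor's importance across model runs.

    Assumes every run reports the same factors as the first run.
    """
    factors = runs[0].keys()
    return {name: sum(run[name] for run in runs) / len(runs) for name in factors}


def top_factors(importance: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k most important factors, highest first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

Calling `top_factors(mean_importance(runs), k=10)` on the 50 importance sets of one defect type would produce a ranking of the kind summarised in Fig. 5.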

Climatic impacts
Our results indicate that meteorological variables have a significant impact on subgrade defects, particularly frost damage and mud pumping. The analysis prioritizes the identification of the most influential meteorological variables associated with each defect. Rainfall factors ranked among the top three in importance for every defect except settlement. Rainfall influences common subgrade defects mainly by increasing the moisture content of the subgrade soil. We found that subgrade settlement, frost damage, mud pumping, and uplift deformation are all intricately linked to the presence of water, which is consistent with previous research 20,27,44-46. In the case of frost damage, soil moisture crystallizes into ice as temperatures drop and fills the soil voids, resulting in relative displacement of the subgrade particles. Mud pumping could be influenced by changes in pore water pressure in the subgrade under train loads, which soften the subgrade and make it highly susceptible to pumping. Uplift deformation is associated with expansive rock and soil in the subgrade, which expand upon water absorption. Related research has shown a significant correlation between the vertical uplift deformation rate of the pavement and the amount of atmospheric rainfall. Furthermore, extreme precipitation, indicated by the rainfall amount over 5 consecutive days, could exceed the subgrade drainage capacity, elevating soil moisture and heightening susceptibility to defects.
We found that annual freezing days have the greatest impact on frost damage of the subgrade, which is consistent with laboratory experiments on subgrade permafrost and numerical simulations 25. This is because annual freezing days are closely related to the freeze-thaw cycles of the subgrade, which can lead to frost damage. The variable classifies frozen soil in the subgrade into three categories: instantaneous frozen soil, seasonal frozen soil, and permafrost. Permafrost, due to its long-term exposure to cold, is often in a state of freezing expansion. Seasonal frozen soil experiences thawing and settlement in summer and freezing expansion in winter. This recurring cycle of freezing expansion and settlement poses a significant risk of HSR subgrade defects, and the damage to the subgrade in regions where such seasonal frozen soil occurs repeatedly is notably higher than in areas with permafrost. Instantaneous frozen soil generally experiences less freezing expansion. The effect on mud pumping mainly stems from the fact that, after the subgrade soil freezes and thaws, the water in the soil takes various forms, including ice crystals and residual moisture, which reduces the drainage capacity of the subgrade and makes it difficult to drain moisture effectively. Consequently, poor drainage can result in mud pumping in the subgrade.
Extreme heat, represented by the total number of days with a maximum daily temperature exceeding 35 degrees Celsius, could impact the HSR infrastructure.High temperature can cause the rubber material at the interjoint of the ballastless track slab to harden and fatigue, resulting in the detachment of the interjoint interface and unevenness in the track.With repeated cycles of high-temperature and low-temperature alternation, the rubber material may even fracture, contributing to mud pumping defects.

Geomorphological and geohydrological characteristics
Our results show that geomorphology had a significant impact on roadbed settlement and uplift deformation defects (Fig. 5), while the geohydrological factors showed comparatively less impact. This is inconsistent with the findings of some previous studies 11,47, possibly because HSR lines already avoid the risks caused by geohydrological variables during the design phase. On steeper slopes, especially during heavy rainfall, soil erosion is prone to occur on the subgrade surface. Soil erosion may accelerate the settlement of the roadbed, affecting its stability. In areas with significant slopes, water may flow faster, which could affect water infiltration and drainage. This may result in an uneven distribution of moisture in the soil, thereby affecting the settlement behavior of the subgrade. Moreover, with increasing altitude, temperature, precipitation, and atmospheric pressure may change significantly, thereby affecting the stability of the subgrade. From a topographic perspective, high-altitude areas, characterized by steep slopes and complex terrain, show significant variations in local climate between foothills and hinterlands. Therefore, a more detailed analysis of the region and a precise subgrade risk assessment based on the local climate are necessary.
The microscopic properties of clay minerals contribute to their capability to absorb water molecules on their surfaces. Frost damage to the subgrade only occurs when the water in the soil reaches or exceeds a certain threshold, making clay more likely to cause frost damage than other soils. As for the uplift deformation of the subgrade, when clay absorbs water, its volume increases, which may lead to up-arching of the roadbed.

Anthropogenic influence
Our results show that anthropogenic variables have a significant impact on various subgrade defects, with the analyses emphasizing the most influential anthropogenic variables associated with most defects. Urban land use signifies the extent of anthropogenic interference, which includes the extraction of groundwater, the construction of underground facilities, mining, and so on. When groundwater is extracted, the water pressure in the soil changes and the pumped water carries away fine particles from the soil, resulting in settlement of the subgrade. For example, in the Jakarta and Bandung areas along the Jakarta-Bandung High-Speed Rail, industrial activities and rapid population growth have resulted in the extensive extraction of groundwater, causing significant land subsidence that severely affects the operation of the high-speed train 48. For the uplift deformation defect, in the high-density urban area, a large number of buildings. The area of bare rock reflects the surface area of land consisting of rocks. The bare rock areas may have good characteristics for water infiltration and drainage, which can slow down soil settlement through drainage. In addition, the scarcity of soil moisture may prevent the swelling of weak mudstone, thus reducing the occurrence of roadbed expansion defects and positively affecting the stability of the subgrade.

Limitations and future improvements
Our study has several limitations that can be addressed in future research. Firstly, the defect data in this study are sourced from peer-reviewed literature, which ensures accuracy but may overlook some unreported defect data.
Secondly, selecting the model's hyperparameters poses significant challenges. The crucial parameters of the model are determined through trial and error using a grid search method. If the search space is set inappropriately or potential solutions are overlooked, the optimal solution may not be found. Furthermore, while the Random Forest method is recognized for its strong predictive capability, it falls short in interpretability. Future research should consider employing dedicated analytical methods to further explore the causal relationships among the influencing factors and improve understanding of the mechanisms by which these factors affect high-speed rail infrastructure. Lastly, due to limited resources and capabilities, the influencing factors selected in this study may not be comprehensive. In future research, we can incorporate more reliable data, including media reports, government documents, and bidding information, to avoid overfitting caused by insufficient data. Additionally, we can explore methods for optimizing the model's parameters and include HSR attribute factors to further enhance the model's accuracy.

Potential for application
This study applies a robust and effective machine-learning method for assessing the diverse defect risks inherent in China's high-speed railway infrastructure. The practicality of the Random Forest method is not limited to specific geographic regions or infrastructure types; its powerful data processing capability and its ability to identify complex relationships between environmental factors give it broad application potential. For instance, it can accommodate adjustments in environmental variables, such as rainfall and temperature variation, to suit various climatic zones (e.g., tropical, temperate, polar). Furthermore, this method can be applied to datasets for different infrastructures, including roads, bridges, and tunnels, taking into account their unique risk factors and challenges. By fine-tuning the inputs to the algorithm, it is possible to predict the specific risks faced by these different infrastructures, thereby providing a scientific basis for the design, construction, and maintenance of infrastructure.

Policy recommendations
Global climate change, marked by temperature increases, intensified precipitation, and extreme events, threatens HSR safety and reliability, affecting infrastructure and surrounding environments 49-51. Particularly vulnerable regions such as Minle County and Menyuan Hui Autonomous County (Fig. 4), with seasonal frozen soils, face heightened subgrade defects due to disrupted thermal equilibrium. To address these concerns, several policy recommendations are proposed. First, research and development efforts for HSR infrastructure should be intensified, focusing on enhancing resilience to climate change by developing materials and technologies that can withstand extreme weather conditions. Real-time monitoring and early warning of the seasonal frozen soil environment in the regions crossed by specific HSR projects, such as the Lanzhou-Urumqi and Harbin-Dalian HSRs, should be strengthened. The aim is to track changes in the frozen soil environment in a timely manner and provide a scientific basis for the safe and stable operation of HSR projects. On the Far Eastern Railway in Russia, long-term monitoring of subgrade deformation, weather, and rock layers on railway sections located in permafrost areas has been implemented to mitigate the effects of extreme atmospheric precipitation 52.
Moreover, the design standards of HSR projects should be revised to accommodate frozen soil climate conditions. Construction processes and methods should be optimized to ensure the safety and reliability of HSR projects in frozen soil areas during construction and operation. Emergency response plans and risk assessment systems for HSR must be established and enhanced in response to climate change. This includes augmenting early warning and response capabilities for extreme weather events so as to respond effectively to the sudden risks brought about by climate change. Similarly, in Norway, a preparedness framework has been developed to assess and manage natural climate risks, aiming to reduce railway vulnerability and enhance resilience against the negative impacts of climate change. Its emergency plans for trains include speed restrictions in high-risk areas and the provision of alternative transportation when tracks are obstructed 53.
Lastly, strengthening safety promotion in areas along the HSR line should be emphasized. The government should fully utilize online methods such as government websites, television broadcasting, and new media, as well as offline methods such as home visits and prominent warning signs, to proactively promote policies and regulations related to protecting the safety environment along the HSR line and reducing anthropogenic interference with HSR safety. In Sweden, particularly regarding the Varberg Railway, a study highlighted that human-induced groundwater extraction increases the risk of railway subsidence, suggesting the need for enhanced safety management measures along the railway lines 54.

Conclusions
This study quantitatively assesses the multi-defect subgrade risk in China's HSR infrastructure, using machine learning and historical defect occurrence data. Key environmental factors influencing subgrade defects, such as rainfall, freezing days, extreme temperature, land use, slope, and altitude, are identified, providing valuable insights for HSR planning. Spatial analysis further reveals the distribution characteristics of different defects across various regions in China, particularly pointing out high-risk areas such as the Menyuan Hui Autonomous County and Minle County sections of the Lanzhou-Urumqi HSR, which require increased attention and preventative measures to minimize potential losses and ensure operational continuity.
For high-risk areas and types of defects, we recommend intensifying R&D efforts for HSR projects to develop materials and technologies capable of withstanding extreme weather conditions; optimizing design standards and construction methods for HSR projects, especially under permafrost climate conditions; establishing and improving emergency response plans and risk assessment systems for HSR to address sudden risks posed by climate change; and enhancing safety promotion along HSR lines to reduce human interference and ensure the safe and stable operation of HSR. While focused on China's HSR, the methods are adaptable to railway infrastructure risk assessment globally, with challenges remaining in incorporating engineering design characteristics and evolving climate change impacts. Further research is needed to address these challenges.

Figure 3. Predicted risk distribution of main HSR subgrade defects in China: (a) subgrade settlement, (b) frost damage, (c) uplift deformation, and (d) mud pumping. Means are shown for each ensemble of 50 RF models.

Figure 4. Integrated co-occurrence risk map of HSR subgrade defects in China.

Figure 5. Importance map of defect factors of HSR subgrade foundations in China: (a) subgrade settlement, (b) frost damage, (c) uplift deformation, and (d) mud pumping.

Land use indirectly influences the occurrence of subgrade defects. Extracting groundwater in urban areas can lead to subgrade defects, while areas with multiple rock types can enhance the strength of subgrade and reduce settlement

Scientific Reports | (2024) 14:5487 | https://doi.org/10.1038/s41598-024-56234-8

5-day rainfall, number of days with maximum temperature exceeding 35 degrees Celsius, annual freezing days, and wind speed data were sourced from the National Earth System

Table 2. Detailed description of factors. For all soil units, the reference depth is typically set at 100 cm. However, for Rendzinas and Rankers in the FAO-74 classification, and for Leptosols in the FAO-90 classification, the reference soil depth is set at 30 cm; for Lithosols in FAO-74 and Lithic Leptosols in FAO-90, the reference depth is set at 10 cm, and also at 0 cm

probability values greater than 0.6 for each defect were selected and assigned a value of 1; otherwise, they were assigned 0. Spatial coupling of the four defects was performed to produce a comprehensive risk map of railway subgrade defects in China

The low, medium, high, and very high risk areas in the graph have values of 0, 1, 2, and 3, respectively, representing the risk level of the area. It is noteworthy that this map displays regions with high risks for all four defects (probability values greater than 0.6), thus necessitating extra attention in HSR operations and new HSR planning. All distribution maps in the figure were drawn by ArcGIS (v10.7, www.esri.com).
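
The thresholding and coupling step mentioned above (probabilities greater than 0.6 set to 1, otherwise 0, then combined across the four defects) can be sketched as below. The sketch assumes that "spatial coupling" is a cell-by-cell sum of the binary layers on a common grid; the text does not state how the resulting counts map onto the displayed risk levels, so the example stops at the count. Function names are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[int]]


def binarize(prob_map: Sequence[Sequence[float]], threshold: float = 0.6) -> Grid:
    """Assign 1 where the averaged probability exceeds the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


def couple(binary_maps: Sequence[Grid]) -> Grid:
    """Cell-wise sum of binary layers (assumed meaning of spatial coupling).

    All layers must share the same grid shape.
    """
    n_rows = len(binary_maps[0])
    n_cols = len(binary_maps[0][0])
    return [
        [sum(layer[r][c] for layer in binary_maps) for c in range(n_cols)]
        for r in range(n_rows)
    ]
```

Each cell of the coupled grid then counts how many defect types exceed the 0.6 probability threshold at that location.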